Abstract

Two maximization problems of Rényi entropy rate are investigated: the maximization over all stochastic processes whose marginals satisfy a linear constraint, and the Burg-like maximization over all stochastic processes whose autocovariance function begins with some given values. The solutions are related to the solutions to the analogous maximization problems of Shannon entropy rate.

Keywords: Rényi entropy, Rényi entropy rate, entropy rate, maximization, Burg's Theorem.


1 Introduction 

Motivated by recent results providing an operational meaning to Rényi entropy [1], we study the maximization of the Rényi entropy rate (or "Rényi rate") over the class of stochastic processes $\{Z_k\}_{k \in \mathbb{Z}}$ that satisfy

$$ \Pr[Z_k \in \mathcal{S}] = 1, \quad \mathrm{E}[r(Z_k)] \le \Gamma, \quad k \in \mathbb{Z}, \tag{1} $$

where $\mathcal{S} \subset \mathbb{R}$ is some given support set, $r(\cdot)$ is some cost function, $\Gamma \in \mathbb{R}$ is some maximal-allowed average cost, and $\mathbb{R}$ and $\mathbb{Z}$ denote the reals and the integers respectively.

If instead of Rényi rate we had maximized the Shannon rate, we could have limited ourselves to memoryless processes, because the Shannon entropy of a random vector is upper-bounded by the sum of the Shannon entropies of its components, and this upper bound is tight when the components are independent.¹ But this bound does not hold for Rényi entropy: the Rényi entropy of a vector with dependent components can exceed the sum of the Rényi entropies of its components. Consequently, the solution to the maximization of the Rényi rate subject to (1) is typically not memoryless. This maximum and the structure of the stochastic processes that approach it are the subject of this paper.

Another class of stochastic processes that we shall consider is related to Burg's work on spectral estimation [2], [3, Theorem 12.6.1]. It comprises all (one-sided) stochastic processes $\{X_i\}_{i \in \mathbb{N}}$ that, for some given $\alpha_0, \ldots, \alpha_p \in \mathbb{R}$, satisfy

$$ \mathrm{E}[X_i X_{i+k}] = \alpha_k, \quad \bigl( i \in \mathbb{N},\ k \in \{0, \ldots, p\} \bigr), \tag{2} $$

where $\mathbb{N}$ denotes the positive integers. While Burg studied the maximum over this class of the Shannon rate, we will study the maximum of the Rényi rate.

We emphasize that our focus here is on the maximization of Rényi rate and not entropy. The latter is studied in [4], [5], [6], and [7].

To describe our results we need some definitions. The order-$\alpha$ Rényi entropy of a probability density function (PDF) $f$ is defined as

$$ h_\alpha(f) = \frac{1}{1-\alpha} \log \int f^\alpha(x)\, dx, \tag{3} $$

where $\alpha$ can be any positive number other than one. The integrand is nonnegative, so the integral on the RHS of (3) always exists, possibly taking on the value $+\infty$, in which case we define $h_\alpha(f)$ as $+\infty$ if $0 < \alpha < 1$ and as $-\infty$ if $\alpha > 1$. With this convention the Rényi entropy always exists and

$$ h_\alpha(f) > -\infty, \quad 0 < \alpha < 1, \tag{4} $$

$$ h_\alpha(f) < +\infty, \quad \alpha > 1. \tag{5} $$

When a random variable (RV) $X$ is of density $f_X$ we sometimes write $h_\alpha(X)$ instead of $h_\alpha(f_X)$. The Rényi entropies of some multivariate densities are computed in [8].

1 Throughout this paper "Shannon entropy" refers to differential Shannon entropy.

If the support of $f$ is contained in $\mathcal{S}$, then

$$ h_\alpha(f) \le \log |\mathcal{S}|, \quad (\alpha > 0,\ \alpha \ne 1), \tag{6} $$

where $|\mathcal{A}|$ denotes the Lebesgue measure of the set $\mathcal{A}$, and where we interpret $\log |\mathcal{S}|$ as $+\infty$ when $|\mathcal{S}|$ is infinite. (Throughout this paper we define $\log \infty = \infty$ and $\log 0 = -\infty$.)

The Rényi entropy is closely related to the Shannon entropy:

$$ h(f) = -\int_{-\infty}^{\infty} f(x) \log f(x)\, dx. \tag{7} $$

(The integral on the RHS of (7) need not exist. If it does not, then we say that $h(f)$ does not exist.) Depending on whether $\alpha$ is smaller or larger than one, the Rényi entropy can be larger or smaller than the Shannon entropy. Indeed, if $f$ is of Shannon entropy $h(f)$ (possibly $+\infty$), then by [9, Lemma 5.1 (iv)]:

$$ h_\alpha(f) \le h(f), \quad \text{for } \alpha > 1; \tag{8} $$

$$ h_\alpha(f) \ge h(f), \quad \text{for } 0 < \alpha < 1. \tag{9} $$

Moreover, under some mild technical conditions [9, Lemma 5.1 (ii)]:

$$ \lim_{\alpha \to 1} h_\alpha(f) = h(f). \tag{10} $$
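As a sanity check of (3) and (8)-(10), the following sketch evaluates the Rényi entropy of a centered Gaussian density both in closed form and by direct numerical integration. The Gaussian example, the function names, and the use of natural logarithms are assumptions made here for illustration; they are not taken from the paper.

```python
import math


def renyi_entropy_gaussian(alpha: float, sigma: float) -> float:
    """Closed-form order-alpha Renyi entropy (in nats) of N(0, sigma^2)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from one")
    return 0.5 * math.log(2 * math.pi * sigma**2) + math.log(alpha) / (2 * (alpha - 1))


def shannon_entropy_gaussian(sigma: float) -> float:
    """Differential Shannon entropy (in nats) of N(0, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)


def renyi_entropy_numeric(alpha: float, sigma: float, steps: int = 20000) -> float:
    """Midpoint-rule evaluation of definition (3) for N(0, sigma^2)."""
    lo, hi = -12.0 * sigma, 12.0 * sigma  # the tails beyond this are negligible
    dx = (hi - lo) / steps
    integral = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        density = math.exp(-x * x / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)
        integral += density**alpha * dx
    return math.log(integral) / (1 - alpha)


if __name__ == "__main__":
    for a in (0.5, 1.0001, 2.0):
        print(a, renyi_entropy_gaussian(a, 1.5), shannon_entropy_gaussian(1.5))
```

For this density the order-2 entropy falls below the Shannon entropy and the order-1/2 entropy lies above it, in line with (8) and (9), and orders close to one give values close to the Shannon entropy, as in (10).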

The order-$\alpha$ Rényi rate $h_\alpha(\{X_k\})$ of a stochastic process (SP) $\{X_k\}$ is defined as

$$ h_\alpha(\{X_k\}) = \lim_{n \to \infty} \frac{1}{n} h_\alpha(X_1^n) \tag{11} $$

whenever the limit exists.² Here $X_i^j$ denotes the tuple $(X_i, \ldots, X_j)$.

Notice that if each $X_k$ takes value in $\mathcal{S}$, then $X_1^n$ takes value in $\mathcal{S}^n$, and it then follows from (6) that $h_\alpha(X_1^n) \le \log |\mathcal{S}|^n$ and thus

$$ h_\alpha(\{X_k\}) \le \log |\mathcal{S}|. \tag{12} $$

2 We say that the limit exists and is equal to $+\infty$ if for every $M > 0$ there exists some $n_0$ such that for all $n > n_0$ the Rényi entropy $h_\alpha(X_1, \ldots, X_n)$ exceeds $nM$, possibly by being $+\infty$.

Another upper bound on $h_\alpha(\{X_k\})$, one that is valid for $\alpha > 1$, can be obtained by noting that when $\alpha > 1$ we can use (8) to obtain

$$ h_\alpha(X_1^n) \le h(X_1^n) \tag{13} $$

$$ \le \sum_{i=1}^{n} h(X_i), \tag{14} $$

and thus, by (13),

$$ h_\alpha(\{X_k\}) \le h(\{X_k\}), \quad \alpha > 1, \tag{15} $$

whenever both $h_\alpha(\{X_k\})$ and the Shannon rate $h(\{X_k\})$ exist.

The Rényi rate of finite-state Markov chains was computed by Rached, Alajaji, and Campbell [10] with extensions to countable state space in [11]. The Rényi rate of stationary Gaussian processes was found by Golshani and Pasha in [12]. Extensions are explored in [13].


2 Main Results 

We discuss the constraints (1) and (2) separately. The proofs pertaining to the former are in Section 4 and to the latter in Section 5.

2.1 Max Rényi Rate Subject to (1)

Let $h^*(\Gamma)$ denote the supremum of $h(f_X)$ over all densities $f_X$ under which

$$ \Pr(X \in \mathcal{S}) = 1 \quad \text{and} \quad \mathrm{E}[r(X)] \le \Gamma. \tag{16} $$

Here and throughout the supremum should be interpreted as $-\infty$ whenever the maximization is over an empty set. Thus, if no distribution satisfies (16), then $h^*(\Gamma)$ is $-\infty$.

We shall assume that for some $\Gamma_0 \in \mathbb{R}$

$$ h^*(\Gamma_0) > -\infty, \tag{17a} $$

$$ h^*(\Gamma) < \infty \quad \text{for every } \Gamma > \Gamma_0. \tag{17b} $$

Under this assumption the function $h^*$ has the following properties:

Proposition 1. Let $\Gamma_0$ satisfy (17). Then over the interval $[\Gamma_0, \infty)$ the function $h^*(\cdot)$ is finite, nondecreasing, and concave. It is continuous over $(\Gamma_0, \infty)$, and

$$ \lim_{\Gamma \to \infty} h^*(\Gamma) = \log |\mathcal{S}|. \tag{18} $$

Proof. Monotonicity is immediate from the definition because increasing $\Gamma$ enlarges the set of densities that satisfy (16). Concavity follows from the concavity of Shannon entropy, and continuity follows from concavity. It remains to establish (18). To this end we first argue that for every $\Gamma$,

$$ h^*(\Gamma) \le \log |\mathcal{S}|. \tag{19} $$

When $|\mathcal{S}|$ is infinite this is trivial, and when $|\mathcal{S}|$ is finite this follows by noting that $h^*(\Gamma)$ cannot exceed the maximum of the Shannon entropy in the absence of cost constraints, and the latter is achieved by a uniform distribution on $\mathcal{S}$ and is equal to $\log |\mathcal{S}|$. In view of (19), our claim (18) will follow once we establish that

$$ \lim_{\Gamma \to \infty} h^*(\Gamma) \ge \log |\mathcal{S}|, \tag{20} $$

which is what we set out to prove next.

We first note that for every $\Gamma \in \mathbb{R}$

$$ h^*(\Gamma) \ge \log \bigl| \{ x \in \mathcal{S} : r(x) \le \Gamma \} \bigr| \tag{21} $$

because when the RHS is finite it can be achieved by a uniform distribution on the set $\{ x \in \mathcal{S} : r(x) \le \Gamma \}$, a distribution under which (16) clearly holds, and when it is infinite, it can be approached by uniform distributions on ever-increasing compact subsets of this set. We next note that, by the Monotone Convergence Theorem (MCT),

$$ \lim_{\Gamma \to \infty} \bigl| \{ x \in \mathcal{S} : r(x) \le \Gamma \} \bigr| = |\mathcal{S}|. \tag{22} $$

Combining (21) and (22) establishes (20) and hence completes the proof of (18). □

For $\alpha > 1$ we note that (13), (14), and the definition of $h^*(\Gamma)$ imply that for every SP $\{Z_k\}$ satisfying (1)

$$ h_\alpha(\{Z_k\}) \le h^*(\Gamma), \quad \alpha > 1, \tag{23} $$

and consequently,

$$ \sup h_\alpha(\{Z_k\}) \le h^*(\Gamma), \quad \alpha > 1, \tag{24} $$

where the supremum is over all SPs satisfying (1). Perhaps surprisingly, this bound is tight:

Theorem 2 (Max Rényi Rate for $\alpha > 1$). Suppose that $\alpha > 1$, and that $\Gamma > \Gamma_0$, where $\Gamma_0$ satisfies (17). Then for every $\epsilon > 0$ there exists a stationary SP $\{Z_k\}$ satisfying (1) whose Rényi rate is defined and exceeds $h^*(\Gamma) - \epsilon$.

For $0 < \alpha < 1$ we can use (12) to obtain for the same supremum

$$ \sup h_\alpha(\{Z_k\}) \le \log |\mathcal{S}|, \quad 0 < \alpha < 1. \tag{25} $$

This seemingly crude bound is tight:

Theorem 3 (Max Rényi Rate for $0 < \alpha < 1$). Suppose that $0 < \alpha < 1$ and that $\Gamma > \Gamma_0$, where $\Gamma_0$ satisfies (17).

• If $|\mathcal{S}| = \infty$, then for every $M \in \mathbb{R}$ there exists a stationary SP $\{Z_k\}$ satisfying (1) whose Rényi rate is defined and exceeds $M$.

• If $|\mathcal{S}| < \infty$, then for every $\epsilon > 0$ there exists a stationary SP $\{Z_k\}$ satisfying (1) whose Rényi rate is defined and exceeds $\log |\mathcal{S}| - \epsilon$.

Remark 4. Theorems 2 and 3 can be generalized in a straightforward fashion to account for multiple constraints:

$$ \mathrm{E}[r_i(Z_k)] \le \Gamma_i, \quad i = 1, \ldots, m. \tag{26} $$

However, for ease of presentation we focus on the case of a single constraint.

A special case of Theorems 2 and 3 is when the cost is quadratic, i.e., $r(x) = x^2$, and there are no restrictions on the support, i.e., $\mathcal{S} = \mathbb{R}$. In this case we can slightly strengthen the results of the above theorems: When we consider the proofs of these theorems for this case, we see that the proposed distributions are isotropic. We can thus establish that the constructed SP is centered and uncorrelated:

Proposition 5 (Rényi Rate under a Second-Moment Constraint).

1. For every $\alpha > 1$, every $\sigma > 0$, and every $\epsilon > 0$ there exists a centered stationary SP $\{Y_k\}$ whose Rényi rate exceeds $\frac{1}{2} \log(2\pi e \sigma^2) - \epsilon$ and that satisfies

$$ \mathrm{E}[Y_k Y_{k'}] = \sigma^2\, \mathrm{I}\{k = k'\}. \tag{27} $$

2. For every $0 < \alpha < 1$, every $\sigma > 0$, and every $M \in \mathbb{R}$ there exists a centered stationary SP $\{Y_k\}$ whose Rényi rate exceeds $M$ and that satisfies (27).

This proposition will be the key to the proof of Theorem 6 ahead.

2.2 Max Rényi Rate Subject to (2)

Given $\alpha_0, \ldots, \alpha_p \in \mathbb{R}$, consider the family of all stochastic processes $X_1, X_2, \ldots$ satisfying (2). Assume that the $(p+1) \times (p+1)$ matrix whose Row-$\ell$ Column-$m$ element is $\alpha_{|\ell - m|}$ is positive definite. Under this assumption we have:

Theorem 6. The supremum of the order-$\alpha$ Rényi rate over all stochastic processes satisfying (2) is $+\infty$ for $0 < \alpha < 1$ and is equal to the Shannon rate of the $p$-th order Gauss-Markov process for $\alpha > 1$.

3 Preliminaries 

3.1 Weak Typicality 

Given a density $f$ on $\mathcal{S}$ of finite Shannon entropy

$$ -\infty < h(f) < \infty, \tag{28} $$

a positive integer $n$, and some $\epsilon > 0$, we follow [3, Section 8.2] and denote by $\mathcal{T}_n^\epsilon(f)$ the set of $\epsilon$-weakly-typical sequences of length $n$ with respect to $f$:

$$ \mathcal{T}_n^\epsilon(f) = \Bigl\{ x_1^n \in \mathcal{S}^n : 2^{-n(h(f)+\epsilon)} \le \prod_{k=1}^{n} f(x_k) \le 2^{-n(h(f)-\epsilon)} \Bigr\}. \tag{29} $$

By the AEP, if $X_1, \ldots, X_n$ are drawn IID according to some such $f$, then the probability of $(X_1, \ldots, X_n)$ being in $\mathcal{T}_n^\epsilon(f)$ tends to 1 as $n \to \infty$ (with $\epsilon$ held fixed) [3, Theorem 8.2.2].

Given some measurable function $r \colon \mathcal{S} \to \mathbb{R}$, some density $f$ that is supported on $\mathcal{S}$ and that satisfies

$$ \int f(x)\, |r(x)|\, dx < \infty, \tag{30} $$

and given some $n \in \mathbb{N}$ and $\epsilon > 0$, we define

$$ \mathcal{G}_n^\epsilon(f) = \Bigl\{ x_1^n \in \mathcal{S}^n : \Bigl| \frac{1}{n} \sum_{k=1}^{n} r(x_k) - \int f(x)\, r(x)\, dx \Bigr| < \epsilon \Bigr\}. \tag{31} $$

By the Law of Large Numbers (LLN), if $X_1, \ldots, X_n$ are drawn IID according to some density $f$ that satisfies the above conditions, then the probability of $(X_1, \ldots, X_n)$ being in $\mathcal{G}_n^\epsilon(f)$ tends to 1 as $n \to \infty$ (with $\epsilon$ held fixed).

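From the above observations on $\mathcal{T}_n^\epsilon(f)$ and $\mathcal{G}_n^\epsilon(f)$ we conclude that if $X_1, \ldots, X_n$ are drawn IID according to some density $f$ that is supported by $\mathcal{S}$ and that satisfies (28) and (30), then the probability of $(X_1, \ldots, X_n)$ being in the intersection $\mathcal{T}_n^\epsilon(f) \cap \mathcal{G}_n^\epsilon(f)$ tends to 1 as $n \to \infty$. Thus, for all sufficiently large $n$,

$$ 1 - \epsilon < \int_{\mathcal{T}_n^\epsilon(f) \cap \mathcal{G}_n^\epsilon(f)} \prod_{k=1}^{n} f(x_k)\, dx_1^n \le \bigl| \mathcal{T}_n^\epsilon(f) \cap \mathcal{G}_n^\epsilon(f) \bigr|\, 2^{-n(h(f)-\epsilon)}, $$

where the second inequality holds by (29).

We thus conclude that if the support of $f$ is contained in $\mathcal{S}$, the expectation of $|r(X)|$ under $f$ is finite, and $h(f)$ is defined and is finite, then

$$ \bigl| \mathcal{T}_n^\epsilon(f) \cap \mathcal{G}_n^\epsilon(f) \bigr| > (1 - \epsilon)\, 2^{n(h(f)-\epsilon)}, \quad n \text{ large}. \tag{32} $$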
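The concentration behind (29) and (32) can be illustrated by simulation. The sketch below is only an illustration under assumptions made here: it takes $f$ to be the unit-rate exponential density (whose Shannon entropy is $\log_2 e$ bits), and the function name and parameters are hypothetical. It estimates how often an IID sequence of length $n$ lands in the weakly typical set.

```python
import math
import random

LOG2_E = 1.0 / math.log(2.0)  # Shannon entropy of the Exp(1) density, in bits


def typical_fraction(n: int, eps: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr[(X_1, ..., X_n) is eps-weakly typical], X_k IID Exp(1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = sum(rng.expovariate(1.0) for _ in range(n))
        # For f(x) = exp(-x) on x >= 0: -log2 prod_k f(x_k) = (sum_k x_k) * log2(e).
        neg_log2_prob = total * LOG2_E
        if n * (LOG2_E - eps) <= neg_log2_prob <= n * (LOG2_E + eps):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (10, 100, 2000):
        print(n, typical_fraction(n, eps=0.1, trials=200))
```

With $\epsilon$ held fixed, the estimated fraction approaches one as $n$ grows, which is the behavior the AEP asserts.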


3.2 On the Rényi Entropy of Mixtures

The following lemma provides a lower bound on the Rényi entropy of a mixture of densities in terms of the Rényi entropy of the individual densities.

Lemma 7. Let $f_1, \ldots, f_p$ be probability density functions on $\mathbb{R}^n$ and $q_1, \ldots, q_p \ge 0$ nonnegative numbers that sum to one. Let $f$ be the mixture density

$$ f(\mathbf{x}) = \sum_{\ell=1}^{p} q_\ell f_\ell(\mathbf{x}), \quad \mathbf{x} \in \mathbb{R}^n. $$

Then

$$ h_\alpha(f) \ge \min_{1 \le \ell \le p} h_\alpha(f_\ell). $$

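Proof. For $0 < \alpha < 1$ this follows by the concavity of Rényi entropy. Consider now $\alpha > 1$:

$$ \begin{aligned} \log \int f^\alpha(\mathbf{x})\, d\mathbf{x} &= \log \int \Bigl( \sum_{\ell=1}^{p} q_\ell f_\ell(\mathbf{x}) \Bigr)^{\alpha} d\mathbf{x} \\ &\le \log \int \sum_{\ell=1}^{p} q_\ell f_\ell^\alpha(\mathbf{x})\, d\mathbf{x} \\ &= \log \sum_{\ell=1}^{p} q_\ell \int f_\ell^\alpha(\mathbf{x})\, d\mathbf{x} \\ &\le \log \max_{1 \le \ell \le p} \int f_\ell^\alpha(\mathbf{x})\, d\mathbf{x} \\ &= \max_{1 \le \ell \le p} \log \int f_\ell^\alpha(\mathbf{x})\, d\mathbf{x}, \end{aligned} $$

from which the claim follows because $1/(1 - \alpha)$ is negative. Here the first inequality follows from the convexity of the mapping $\xi \mapsto \xi^\alpha$ (for $\alpha > 1$), and the second inequality follows by upper-bounding the average by the maximum. □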


We next turn to upper bounds. 

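Lemma 8. Consider the setup of Lemma 7.

1. If $\alpha > 1$ then

$$ h_\alpha(f) \le \min_{1 \le \ell \le p} \Bigl\{ \frac{\alpha}{1-\alpha} \log q_\ell + h_\alpha(f_\ell) \Bigr\}. \tag{33} $$

2. If $0 < \alpha < 1$ then

$$ h_\alpha(f) \le \frac{1}{1-\alpha} \log p + \max_{1 \le \ell \le p} h_\alpha(f_\ell). \tag{34} $$

Proof. We begin with the case where $\alpha > 1$. Since the densities and weights are nonnegative,

$$ \Bigl( \sum_{\ell=1}^{p} q_\ell f_\ell(\mathbf{x}) \Bigr)^{\alpha} \ge q_{\ell'}^{\alpha} f_{\ell'}^{\alpha}(\mathbf{x}), \quad \ell' \in \{1, \ldots, p\}. \tag{35} $$

Integrating this inequality, taking logarithms, and dividing by $1 - \alpha$ (which is negative) we obtain

$$ h_\alpha(f) \le \frac{\alpha}{1-\alpha} \log q_{\ell'} + h_\alpha(f_{\ell'}), \quad \ell' \in \{1, \ldots, p\}. \tag{36} $$

Since this holds for every $\ell' \in \{1, \ldots, p\}$, we can minimize over $\ell'$ to obtain (33).

We next turn to the case where $0 < \alpha < 1$.

$$ \begin{aligned} \log \int \Bigl( \sum_{\ell=1}^{p} q_\ell f_\ell(\mathbf{x}) \Bigr)^{\alpha} d\mathbf{x} &\le \log \int \max_{1 \le \ell \le p} f_\ell^\alpha(\mathbf{x})\, d\mathbf{x} \\ &\le \log \int \sum_{\ell=1}^{p} f_\ell^\alpha(\mathbf{x})\, d\mathbf{x} \\ &= \log \sum_{\ell=1}^{p} \int f_\ell^\alpha(\mathbf{x})\, d\mathbf{x} \\ &\le \log \Bigl( p \max_{1 \le \ell \le p} \int f_\ell^\alpha(\mathbf{x})\, d\mathbf{x} \Bigr) \\ &= \log p + \log \max_{1 \le \ell \le p} \int f_\ell^\alpha(\mathbf{x})\, d\mathbf{x} \\ &= \log p + \max_{1 \le \ell \le p} \log \int f_\ell^\alpha(\mathbf{x})\, d\mathbf{x}. \end{aligned} $$

Dividing this inequality by $1 - \alpha$ (positive) yields (34). □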
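The bounds of Lemmas 7 and 8 can be checked numerically on simple examples. The sketch below is a minimal illustration under assumptions made here: the densities are piecewise constant on unit-width bins (so the integral of $f^\alpha$ is an exact finite sum), the weights are strictly positive, entropies are in nats, and all names are hypothetical.

```python
import math


def renyi_entropy(heights, alpha, width=1.0):
    """Order-alpha Renyi entropy (nats) of a density that is constant on equal-width bins."""
    integral = sum(h**alpha for h in heights) * width
    return math.log(integral) / (1 - alpha)


def mixture(densities, weights):
    """Bin heights of the mixture density sum_l q_l f_l."""
    bins = len(densities[0])
    return [sum(q * f[i] for q, f in zip(weights, densities)) for i in range(bins)]


def mixture_bounds(densities, weights, alpha):
    """Return (lower bound of Lemma 7, h_alpha of the mixture, upper bound of Lemma 8)."""
    parts = [renyi_entropy(f, alpha) for f in densities]
    h_mix = renyi_entropy(mixture(densities, weights), alpha)
    lower = min(parts)
    if alpha > 1:
        upper = min(alpha / (1 - alpha) * math.log(q) + h for q, h in zip(weights, parts))
    else:
        upper = math.log(len(densities)) / (1 - alpha) + max(parts)
    return lower, h_mix, upper


if __name__ == "__main__":
    f1 = [0.7, 0.3, 0.0, 0.0]
    f2 = [0.0, 0.1, 0.4, 0.5]
    for a in (0.5, 2.0, 3.0):
        print(a, mixture_bounds([f1, f2], [0.25, 0.75], a))
```

In each case the middle value should lie between the two outer ones; the gap between them does not grow with the dimension, which is what makes these lemmas useful for entropy rates.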


3.3 Bounded Densities 

Proposition 9. If a density $f$ is bounded, and if $\alpha > 1$, then $h_\alpha(f) > -\infty$.

Proof. Let $f$ be a density that is upper-bounded by the constant $\mathsf{M}$ (which must therefore be positive), and suppose that $\alpha > 1$. In this case

$$ f^\alpha(x) = f^{\alpha-1}(x)\, f(x) \le \mathsf{M}^{\alpha-1} f(x), $$

because $\xi \mapsto \xi^{\alpha-1}$ is monotonically increasing when $\alpha > 1$. Integrating over $x$ we obtain

$$ \int f^\alpha(x)\, dx \le \mathsf{M}^{\alpha-1} < \infty. $$

Since $\alpha > 1$, this implies that

$$ \frac{1}{1-\alpha} \log \int f^\alpha(x)\, dx > -\infty. $$

□


The following proposition, which is proved in Appendix A, demonstrates that $h^*$ can be approached by bounded densities.

Proposition 10. Suppose that $\Gamma \in (\Gamma_0, \infty)$, where $\Gamma_0$ satisfies (17). Then for every $\delta > 0$ there exists some bounded density $f^*$ supported by $\mathcal{S}$ such that

$$ \int f^*(x)\, r(x)\, dx < \Gamma + \delta, \tag{37a} $$

$$ h(f^*) > h^*(\Gamma) - \delta. \tag{37b} $$


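3.4 The Marginals of the Uniform Density on $\mathcal{T}_n^\epsilon(f) \cap \mathcal{G}_n^\epsilon(f)$

Lemma 11. Let $f^*$ be a density on $\mathcal{S}$ having finite order-$\alpha$ Rényi entropy

$$ h_\alpha(f^*) > -\infty \tag{38} $$

for some

$$ \alpha > 1 \tag{39} $$

and satisfying (28) and (30). For every $n \in \mathbb{N}$, let $(X_1, \ldots, X_n)$ be drawn uniformly from the set $\mathcal{T}_n^\epsilon(f^*) \cap \mathcal{G}_n^\epsilon(f^*)$, where $\epsilon$ is some fixed positive number. Then for every sufficiently large $n$ the following holds: for any $p \in \{1, \ldots, n\}$ the $p$-tuple $(X_1, \ldots, X_p)$ has finite order-$\alpha$ Rényi entropy

$$ h_\alpha(X_1, \ldots, X_p) > -\infty, \quad \bigl( p \in \{1, \ldots, n\},\ \alpha > 1 \bigr). \tag{40} $$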

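Proof. Denote the uniform density over $\mathcal{T}_n^\epsilon(f^*) \cap \mathcal{G}_n^\epsilon(f^*)$ by $f_n$, and let $q_n$ be the product density

$$ q_n(\mathbf{x}) = \prod_{k=1}^{n} f^*(x_k), \quad \mathbf{x} \in \mathcal{S}^n. \tag{41} $$

Henceforth let $n$ be sufficiently large for (32) to hold. Consequently,

$$ f_n(\mathbf{x}) < \frac{1}{1-\epsilon}\, 2^{-n(h(f^*)-\epsilon)}, \quad \mathbf{x} \in \mathcal{S}^n. \tag{42} $$

Using this inequality and the definition in (29) of $\mathcal{T}_n^\epsilon(f^*)$, we can upper-bound $f_n$ in terms of $q_n$ for tuples in $\mathcal{T}_n^\epsilon(f^*)$:

$$ f_n(\mathbf{x}) < \frac{1}{1-\epsilon}\, 2^{2n\epsilon}\, q_n(\mathbf{x}), \quad \mathbf{x} \in \mathcal{T}_n^\epsilon(f^*). \tag{43} $$

For every $p \in \{1, \ldots, n\}$ we can obtain the density $f_n(x_1, \ldots, x_p)$ of $(X_1, \ldots, X_p)$ by integrating $f_n(x_1, \ldots, x_n)$ over $x_{p+1}, \ldots, x_n$:

$$ \begin{aligned} f_n(x_1, \ldots, x_p) &= \int f_n(\mathbf{x})\, \mathrm{I}\{ \mathbf{x} \in \mathcal{T}_n^\epsilon(f^*) \cap \mathcal{G}_n^\epsilon(f^*) \}\, dx_{p+1} \cdots dx_n \\ &\le \frac{1}{1-\epsilon}\, 2^{2n\epsilon} \int q_n(\mathbf{x})\, \mathrm{I}\{ \mathbf{x} \in \mathcal{T}_n^\epsilon(f^*) \cap \mathcal{G}_n^\epsilon(f^*) \}\, dx_{p+1} \cdots dx_n \\ &\le \frac{1}{1-\epsilon}\, 2^{2n\epsilon} \int q_n(\mathbf{x})\, dx_{p+1} \cdots dx_n \\ &= \frac{1}{1-\epsilon}\, 2^{2n\epsilon}\, f^*(x_1) \cdots f^*(x_p), \quad x_1, \ldots, x_p \in \mathcal{S}, \end{aligned} \tag{44} $$

where $\mathrm{I}\{\cdot\}$ denotes the indicator function, and the first inequality follows from (43); the second by increasing the range of integration; and the final equality follows from (41).

Using (44) we can now lower-bound $h_\alpha(X_1, \ldots, X_p)$ as follows. If a density $f$ is upper-bounded by $\mathsf{K} g$, where $g$ is some other density and $\mathsf{K}$ is some positive constant, and if $\alpha > 1$, then

$$ \begin{aligned} h_\alpha(f) &= \frac{1}{1-\alpha} \log \int f^\alpha(\mathbf{x})\, d\mathbf{x} \\ &\ge \frac{1}{1-\alpha} \log \int \mathsf{K}^\alpha g^\alpha(\mathbf{x})\, d\mathbf{x} \\ &= \frac{\alpha}{1-\alpha} \log \mathsf{K} + h_\alpha(g), \end{aligned} \tag{45} $$

where the inequality holds because $\alpha > 1$ so the pre-log is negative. Using this and (44) we obtain

$$ h_\alpha(X_1, \ldots, X_p) \ge \frac{\alpha}{1-\alpha} \log \Bigl( \frac{1}{1-\epsilon}\, 2^{2n\epsilon} \Bigr) + p\, h_\alpha(f^*) > -\infty. $$

□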
4 Proofs of Theorems 2 and 3

The following proposition is useful for stationarization.

Proposition 12. Let $f_n$ be some density on $\mathcal{S}^n$ having order-$\alpha$ Rényi entropy $h_\alpha(f_n)$ and satisfying

$$ \sum_{k=1}^{n} \mathrm{E}[r(X_k)] \le n\Gamma, \quad (X_1, \ldots, X_n) \sim f_n. \tag{46} $$

Then there exists a stationary SP $\{Z_k\}$ satisfying (1) for which the following holds:

• If

$$ h_\alpha(X_1, \ldots, X_p),\ h_\alpha(X_{n-p'+1}, \ldots, X_n) > -\infty, \quad p, p' \in \{1, \ldots, n-1\}, \tag{47} $$

whenever $(X_1, \ldots, X_n) \sim f_n$ and $p, p' \in \{1, \ldots, n-1\}$, then

$$ \liminf_{m \to \infty} \frac{1}{m} h_\alpha(Z_1, \ldots, Z_m) \ge \frac{1}{n} h_\alpha(f_n). \tag{48} $$

• If

$$ h_\alpha(X_1, \ldots, X_p),\ h_\alpha(X_{n-p'+1}, \ldots, X_n) < +\infty, \quad p, p' \in \{1, \ldots, n-1\}, \tag{49} $$

whenever $(X_1, \ldots, X_n) \sim f_n$ and $p, p' \in \{1, \ldots, n-1\}$, then

$$ \limsup_{m \to \infty} \frac{1}{m} h_\alpha(Z_1, \ldots, Z_m) \le \frac{1}{n} h_\alpha(f_n). \tag{50} $$

• And if both (47) and (49) hold, then

$$ \lim_{m \to \infty} \frac{1}{m} h_\alpha(Z_1, \ldots, Z_m) = \frac{1}{n} h_\alpha(f_n). \tag{51} $$

Proof. Consider first the (nonstationary) SP $\{Y_k\}$ that we construct by drawing

$$ \ldots, Y_{-n+1}^{0},\ Y_1^n,\ Y_{n+1}^{2n}, \ldots \sim \text{IID } f_n. $$

To stationarize it, let $T$ be drawn uniformly over $\{0, \ldots, n-1\}$ independently of $\{Y_k\}$, and define the stationary SP

$$ Z_k = Y_{k+T}, \quad k \in \mathbb{Z}. \tag{52} $$

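It satisfies (1), because $\mathrm{E}[r(Z_k)]$ is the average of $\mathrm{E}[r(X_1)], \ldots, \mathrm{E}[r(X_n)]$, which is at most $\Gamma$ by (46). Consider now any $m$ larger than $2n$, and express $Z_1^m$ in one of two different ways depending on whether $T$ is zero or not. For $T = 0$

$$ Z_1^m = \underbrace{Y_1^n, \ldots, Y_{\nu n - n + 1}^{\nu n}}_{\nu\ n\text{-tuples}},\ \underbrace{Y_{\nu n + 1}, \ldots, Y_m}_{p\ \text{terms}}, \tag{53} $$

where

$$ \nu = \Bigl\lfloor \frac{m}{n} \Bigr\rfloor, \tag{54a} $$

$$ p = m - n \Bigl\lfloor \frac{m}{n} \Bigr\rfloor \in \{0, \ldots, n-1\}. \tag{54b} $$

And for $T \in \{1, \ldots, n-1\}$

$$ Z_1^m = \underbrace{Y_{T+1}, \ldots, Y_n}_{p' = n - T\ \text{terms}},\ \underbrace{Y_{n+1}^{2n}, \ldots, Y_{\nu n + 1}^{(\nu+1)n}}_{\nu\ n\text{-tuples}},\ \underbrace{Y_{(\nu+1)n+1}, \ldots, Y_{m+T}}_{p\ \text{terms}}, \tag{55} $$

where

$$ p' = n - T \in \{1, \ldots, n-1\}, \tag{56a} $$

$$ \nu = \Bigl\lfloor \frac{m - n + T}{n} \Bigr\rfloor, \tag{56b} $$

$$ p = m - n + T - n \Bigl\lfloor \frac{m - n + T}{n} \Bigr\rfloor \in \{0, \ldots, n-1\}. \tag{56c} $$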

Denote the density of $Z_1^m$ by $f_Z$ and its conditional density given $T = t$ by $f_{Z|T=t}$.

To establish (48) we use Lemma 7, which implies that

$$ h_\alpha(f_Z) \ge \min_{0 \le t \le n-1} h_\alpha(f_{Z|T=t}). \tag{57} $$

To compute $h_\alpha(f_{Z|T=0})$ we use (53) to obtain

$$ h_\alpha(f_{Z|T=0}) = \Bigl\lfloor \frac{m}{n} \Bigr\rfloor h_\alpha(f_n) + h_\alpha(X_1, \ldots, X_p) \tag{58} $$

$$ \ge \Bigl\lfloor \frac{m}{n} \Bigr\rfloor h_\alpha(f_n) + 0 \wedge \min_{1 \le p \le n-1} \bigl\{ h_\alpha(X_1, \ldots, X_p) \bigr\}, \tag{59} $$

where the second term on the RHS of (58) should be interpreted as zero when $p$ is zero, and where $a \wedge b$ denotes the minimum of $a$ and $b$.

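And to compute $h_\alpha(f_{Z|T=t})$ for $t \in \{1, \ldots, n-1\}$ we use (55) and the independence of the blocks to obtain

$$ h_\alpha(f_{Z|T=t}) = h_\alpha(X_{n-p'+1}, \ldots, X_n) + \nu\, h_\alpha(f_n) + h_\alpha(X_1, \ldots, X_p), \tag{60} $$

where $\nu$, $p$, $p'$ are obtained from (56) by substituting $t$ for $T$, and the last term on the RHS should be interpreted as zero when $p$ is zero.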

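It thus follows from (57), (59), (60), and the above interpretation that

$$ h_\alpha(f_Z) \ge 0 \wedge \min_{1 \le p' \le n-1} \bigl\{ h_\alpha(X_{n-p'+1}, \ldots, X_n) \bigr\} + 0 \wedge \min_{1 \le p \le n-1} \bigl\{ h_\alpha(X_1, \ldots, X_p) \bigr\} + \min \Bigl\{ \Bigl\lfloor \frac{m}{n} \Bigr\rfloor h_\alpha(f_n),\ \Bigl( \Bigl\lfloor \frac{m}{n} \Bigr\rfloor - 1 \Bigr) h_\alpha(f_n) \Bigr\}. \tag{61} $$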


The first two terms do not depend on $m$ and are greater than $-\infty$ whenever (47) holds. Dividing (61) by $m$ and letting $m$ tend to infinity (with $n$ held fixed) establishes (48).

To establish (50) we need an upper bound on $h_\alpha(f_Z)$. Such a bound can be obtained from Lemma 8. The exact form of the bound depends on whether $\alpha$ exceeds 1 or not. But either form leads to (50) upon dividing by $m$ and letting it tend to infinity.

To conclude the proof we note that (51) follows from (50) and (48). □
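The random-shift stationarization of (52) used in the proof of Proposition 12 can be sketched in code. The following is a minimal illustration, not the paper's construction verbatim: `sample_block` is a hypothetical callable that draws one length-$n$ block from $f_n$, and only the window $Z_1, \ldots, Z_m$ is produced, so only the blocks $Y_1^n, Y_{n+1}^{2n}, \ldots$ that overlap this window are drawn.

```python
import random


def stationarized_window(sample_block, n, m, rng):
    """Return (Z_1, ..., Z_m) with Z_k = Y_{k+T}, where T is uniform on {0, ..., n-1}.

    The process Y is a concatenation of independent length-n blocks, each drawn by
    calling sample_block(rng); the shift T is drawn independently of the blocks.
    """
    shift = rng.randrange(n)                 # the random shift T
    num_blocks = (m + shift + n - 1) // n    # enough blocks to cover Y_1, ..., Y_{m+T}
    y = []
    for _ in range(num_blocks):
        block = list(sample_block(rng))
        if len(block) != n:
            raise ValueError("sample_block must return exactly n values")
        y.extend(block)
    return y[shift:shift + m]                # y[0] plays the role of Y_1


if __name__ == "__main__":
    # Hypothetical block law: a uniformly random ordering of four fixed values,
    # so the components within a block are dependent.
    def permuted_block(r):
        return r.sample([-1.0, 0.0, 1.0, 2.0], 4)

    print(stationarized_window(permuted_block, n=4, m=10, rng=random.Random(0)))
```

Because the shift is uniform over one block length, every time index is equally likely to fall at each position within a block, which is why each marginal cost $\mathrm{E}[r(Z_k)]$ equals the block average.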

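Proof of Theorem 2. Since $h^*(\cdot)$ is continuous on the ray $(\Gamma_0, \infty)$, and since $\Gamma > \Gamma_0$ by the theorem's hypotheses, $h^*(\cdot)$ is continuous at $\Gamma$. Consequently, we can find some $\Gamma'$ for which

$$ \Gamma' < \Gamma \tag{62a} $$

$$ h^*(\Gamma') > h^*(\Gamma) - \epsilon. \tag{62b} $$

These inequalities imply that we can find some $\delta > 0$ small enough so that

$$ \Gamma' + \delta < \Gamma \tag{63a} $$

$$ h^*(\Gamma') - \delta > h^*(\Gamma) - \epsilon. \tag{63b} $$

By Proposition 10 there exists some bounded density $f^*$ supported by $\mathcal{S}$ such that

$$ \int f^*(x)\, r(x)\, dx < \Gamma' + \delta, \tag{64a} $$

$$ h(f^*) > h^*(\Gamma') - \delta. \tag{64b} $$

Moreover, the boundedness of $f^*$, the hypothesis that $\alpha > 1$, and Proposition 9 imply that

$$ h_\alpha(f^*) > -\infty. \tag{64c} $$

These inequalities combine with (63) to imply

$$ \int f^*(x)\, r(x)\, dx < \Gamma \tag{65a} $$

$$ h(f^*) > h^*(\Gamma) - \epsilon. \tag{65b} $$

We can hence choose $\varepsilon > 0$ (a second constant, distinct from the $\epsilon$ of the theorem) small enough so that

$$ \int f^*(x)\, r(x)\, dx < \Gamma - \varepsilon \tag{66a} $$

$$ h(f^*) > h^*(\Gamma) - \epsilon + \varepsilon. \tag{66b} $$

Let $f_n$ be the uniform density over $\mathcal{T}_n^\varepsilon(f^*) \cap \mathcal{G}_n^\varepsilon(f^*)$.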
The cost of $f_n$ can be bounded by noting that its support is contained in $\mathcal{G}_n^\varepsilon(f^*)$, and

$$ \begin{aligned} x_1^n \in \mathcal{G}_n^\varepsilon(f^*) &\implies \frac{1}{n} \sum_{k=1}^{n} r(x_k) < \int f^*(x)\, r(x)\, dx + \varepsilon \\ &\implies \frac{1}{n} \sum_{k=1}^{n} r(x_k) < \Gamma, \end{aligned} $$

where the second implication follows from (66a). Thus,

$$ \int f_n(\mathbf{x}) \sum_{i=1}^{n} r(x_i)\, d\mathbf{x} < n\Gamma. \tag{67} $$


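To lower-bound its Rényi entropy, we note that by the LLN (in combination with (66a)) and the AEP (see Section 3.1)

$$ \bigl| \mathcal{T}_n^\varepsilon(f^*) \cap \mathcal{G}_n^\varepsilon(f^*) \bigr| > (1 - \varepsilon)\, 2^{n(h(f^*) - \varepsilon)}, \quad n \text{ large}. \tag{68} $$

Consequently,

$$ h_\alpha(f_n) > n \bigl( h(f^*) - \varepsilon \bigr) + \log(1 - \varepsilon), \quad n \text{ large}, $$

or, upon dividing by $n$,

$$ \frac{1}{n} h_\alpha(f_n) > h(f^*) - \varepsilon + \frac{1}{n} \log(1 - \varepsilon) \tag{69} $$

for all sufficiently large $n$. We now choose $n$ large enough so that not only will (69) hold but also its RHS will satisfy

$$ h(f^*) - \varepsilon + \frac{1}{n} \log(1 - \varepsilon) > h^*(\Gamma) - \epsilon. $$

(This is possible by (66b).) For this $n$ we thus have

$$ \frac{1}{n} h_\alpha(f_n) > h^*(\Gamma) - \epsilon. \tag{70} $$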

The inequalities (70) and (67) indicate that $f_n$ is a good candidate for the application of Proposition 12. We hence proceed to check its hypotheses. By Lemma 11 and (64c), if $(X_1, \ldots, X_n) \sim f_n$ then

$$ h_\alpha(X_1, \ldots, X_p) > -\infty, \quad p \in \{1, \ldots, n-1\}, \tag{71} $$

and, since $f_n$ is permutation invariant, we also infer

$$ h_\alpha(X_{n-p'+1}, \ldots, X_n) > -\infty, \quad p' \in \{1, \ldots, n-1\}, \tag{72} $$

so (47) holds. And, since $\alpha > 1$, it follows from (5) that (49) also holds. We can thus apply Proposition 12 to conclude the proof. □

Proof of Theorem 3. We first prove the theorem when $|\mathcal{S}| = \infty$. We distinguish between two cases. The first case, which is the case with which we begin, is when there exists some $n \in \mathbb{N}$ and a density $f^*$ of $(X_1, \ldots, X_n)$ such that

$$ \Pr[X_i \in \mathcal{S}] = 1, \quad \mathrm{E}[r(X_i)] \le \Gamma, \quad i \in \{1, \ldots, n\} \tag{73} $$

and

$$ h_\alpha(X_1, \ldots, X_n) = +\infty. \tag{74} $$

To apply Proposition 12 to this density, we note that, since $0 < \alpha < 1$, Inequality (4) implies (47), and the proposition thus guarantees the existence of a stationary SP $\{Z_k\}$ satisfying (1) and (48) so

$$ \lim_{m \to \infty} \frac{1}{m} h_\alpha(Z_1, \ldots, Z_m) = +\infty. \tag{75} $$

This concludes the proof for the case at hand.

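We next turn to the second case where $|\mathcal{S}|$ is still infinite, but any tuple whose components satisfy the constraints has Rényi entropy smaller than $\infty$:

$$ \Bigl( \Pr[X_i \in \mathcal{S}] = 1,\ \mathrm{E}[r(X_i)] \le \Gamma,\ i \in \{\nu_1, \ldots, \nu_2\} \Bigr) \implies \Bigl( h_\alpha(X_{\nu_1}, \ldots, X_{\nu_2}) < \infty \Bigr). \tag{76} $$

Since $|\mathcal{S}|$ is infinite, it follows from Proposition 1 that $h^*(\Gamma) \to \infty$ as $\Gamma \to \infty$. Consequently, there exists some $\Gamma_1$ such that

$$ h^*(\Gamma_1) > M. \tag{77} $$

Since $h^*$ is monotonic, there is no loss in generality in assuming, as we shall, that

$$ \Gamma_1 > \Gamma. \tag{78} $$

Let $\epsilon \in (0, 1)$ be small enough so that

$$ h^*(\Gamma_1) > M + 3\epsilon \tag{79} $$

$$ \Gamma_0 + \epsilon < \Gamma < \Gamma_1 - \epsilon. \tag{80} $$

Let the densities $f^{(0)}$ and $f^{(1)}$ be within $\epsilon$ of achieving $h^*(\Gamma_0)$ and $h^*(\Gamma_1)$ in the sense that their support is contained in $\mathcal{S}$ and

$$ \Bigl( \int f^{(\ell)}(x)\, r(x)\, dx \le \Gamma_\ell, \quad h(f^{(\ell)}) > h^*(\Gamma_\ell) - \epsilon \Bigr), \quad \ell \in \{0, 1\}. \tag{81} $$

For every $n \in \mathbb{N}$, define

$$ \mathcal{S}_\ell = \mathcal{T}_n^\epsilon(f^{(\ell)}) \cap \mathcal{G}_n^\epsilon(f^{(\ell)}), \quad \ell \in \{0, 1\}. \tag{82} $$

It follows from the LLN and AEP that, for all sufficiently large $n$,

$$ |\mathcal{S}_\ell| > (1 - \epsilon)\, 2^{n(h(f^{(\ell)}) - \epsilon)}, \quad \ell \in \{0, 1\}. \tag{83} $$

Assume now that $n$ is large enough for this to hold. Let $\delta > 0$ be small enough so that

$$ (1 - \delta)(\Gamma_0 + \epsilon) + \delta(\Gamma_1 + \epsilon) < \Gamma. \tag{84} $$

(Such a $\delta$ can be found in view of (80).)

Consider now the mixture density

$$ f_n(x_1^n) = (1 - \delta) \frac{1}{|\mathcal{S}_0|}\, \mathrm{I}\{ x_1^n \in \mathcal{S}_0 \} + \delta \frac{1}{|\mathcal{S}_1|}\, \mathrm{I}\{ x_1^n \in \mathcal{S}_1 \}. \tag{85} $$

Let $X_1^n$ be of density $f_n$. Using (84) and an argument similar to the one leading to (67) we obtain

$$ \sum_{k=1}^{n} \mathrm{E}[r(X_k)] < n\Gamma. \tag{86} $$

In fact, the permutation invariance of $f_n$ implies the stronger statement

$$ \mathrm{E}[r(X_k)] < \Gamma, \quad k = 1, \ldots, n. \tag{87} $$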

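We next lower-bound $h_\alpha(X_1^n)$. To this end, we first argue that the sets $\mathcal{S}_0$ and $\mathcal{S}_1$ are disjoint. To see this, note that by the definition of the sets $\mathcal{G}_n^\epsilon(f^{(0)})$, $\mathcal{G}_n^\epsilon(f^{(1)})$ and by (81)

$$ \begin{aligned} x_1^n \in \mathcal{G}_n^\epsilon(f^{(0)}) &\implies \frac{1}{n} \sum_{k=1}^{n} r(x_k) < \int f^{(0)}(x)\, r(x)\, dx + \epsilon \\ &\implies \frac{1}{n} \sum_{k=1}^{n} r(x_k) < \Gamma_0 + \epsilon, \end{aligned} \tag{88} $$

and

$$ \begin{aligned} x_1^n \in \mathcal{G}_n^\epsilon(f^{(1)}) &\implies \frac{1}{n} \sum_{k=1}^{n} r(x_k) > \int f^{(1)}(x)\, r(x)\, dx - \epsilon \\ &\implies \frac{1}{n} \sum_{k=1}^{n} r(x_k) > \Gamma_1 - \epsilon. \end{aligned} \tag{89} $$

From (80), (88), and (89) we now conclude that $\mathcal{G}_n^\epsilon(f^{(0)})$ and $\mathcal{G}_n^\epsilon(f^{(1)})$ are disjoint and hence also $\mathcal{S}_0$ and $\mathcal{S}_1$.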

Having established that $\mathcal{S}_0$ and $\mathcal{S}_1$ are disjoint, we can now compute $h_\alpha(f_n)$ directly to obtain:

$$ \begin{aligned} \frac{1}{n} h_\alpha(f_n) &= \frac{1}{n(1-\alpha)} \log \Bigl( (1 - \delta)^\alpha |\mathcal{S}_0|^{1-\alpha} + \delta^\alpha |\mathcal{S}_1|^{1-\alpha} \Bigr) \\ &> \frac{1}{n(1-\alpha)} \log \Bigl( \delta^\alpha |\mathcal{S}_1|^{1-\alpha} \Bigr). \end{aligned} \tag{90} $$

From this, (83), (81), and (79) it now follows that we can find some sufficiently large $n$ for which

$$ \frac{h_\alpha(X_1^n)}{n} > M. \tag{91} $$

To apply Proposition 12 we note that (87) and (76) imply that (49) holds. And the fact that $\alpha \in (0, 1)$ implies by (4) that (47) holds. Hence, by the proposition, there exists a stationary SP satisfying the constraints and whose Rényi rate is $n^{-1} h_\alpha(X_1^n)$ and thus exceeds $M$. This concludes the proof when $|\mathcal{S}| = \infty$.
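The mechanism behind this case of the proof is that, for $0 < \alpha < 1$, a small mixture weight on a very large set already forces a large Rényi entropy. The sketch below evaluates the Rényi entropy of a mixture of two uniform densities on disjoint sets, together with the lower bound obtained by keeping only the second term inside the logarithm. It is an illustration only: the function names and the sample volumes are assumptions, and entropies are in nats and not normalized by $n$.

```python
import math


def renyi_entropy_two_uniforms(delta, vol0, vol1, alpha):
    """Renyi entropy (nats) of (1-delta)*Unif(S0) + delta*Unif(S1) for disjoint S0, S1.

    vol0 and vol1 are the (positive, finite) volumes of S0 and S1.
    """
    integral = (1 - delta) ** alpha * vol0 ** (1 - alpha) + delta ** alpha * vol1 ** (1 - alpha)
    return math.log(integral) / (1 - alpha)


def lower_bound_second_component(delta, vol1, alpha):
    """Bound obtained by dropping the S0 term; valid as a lower bound for 0 < alpha < 1."""
    return math.log(delta ** alpha * vol1 ** (1 - alpha)) / (1 - alpha)


if __name__ == "__main__":
    h = renyi_entropy_two_uniforms(delta=1e-3, vol0=2.0, vol1=1e30, alpha=0.5)
    lb = lower_bound_second_component(delta=1e-3, vol1=1e30, alpha=0.5)
    print(h, lb, math.log(2.0))  # entropy, its lower bound, and log|S0| for comparison
```

The lower bound equals $\frac{\alpha}{1-\alpha} \log \delta + \log |\mathcal{S}_1|$, so the penalty for a small weight $\delta$ is a constant that does not grow with the volume of the large set.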

The proof when $|\mathcal{S}| < \infty$ is very similar. In fact, it is a bit simpler because $|\mathcal{S}| < \infty$ implies (76). We begin the proof by noting that, since $|\mathcal{S}| < \infty$, Proposition 1 implies that $h^*(\Gamma) \to \log |\mathcal{S}|$ as $\Gamma \to \infty$. Consequently, there exists some $\Gamma_1$ such that

$$ h^*(\Gamma_1) > \log |\mathcal{S}| - \epsilon. \tag{92} $$

Replacing $M$ with $\log |\mathcal{S}| - \epsilon$ in the derivation that leads from (77) to (91), we obtain a density $f_n$ for which

$$ \frac{h_\alpha(f_n)}{n} > \log |\mathcal{S}| - \epsilon. \tag{93} $$

The result then follows from Proposition 12 by noting that the LHS of (49) is upper bounded by $n \log |\mathcal{S}|$ and by noting that (47) holds by (4) because $0 < \alpha < 1$. □
5 Proof of Theorem 6

Proof of Theorem 6. Recall the assumption that the $(p+1) \times (p+1)$ matrix whose Row-$\ell$ Column-$m$ element is $\alpha_{|\ell - m|}$ is positive definite. This implies that there exist constants $a_1, \ldots, a_p$, $\sigma^2$ and a $p \times p$ positive definite matrix $\mathsf{K}_p$ such that the following holds:³ if the random $p$-vector $(W_{1-p}, \ldots, W_0)$ is of second-moment matrix $\mathsf{K}_p$ (not necessarily centered) and if $\{Z_i\}_{i \in \mathbb{N}}$ are independent of $(W_{1-p}, \ldots, W_0)$ with

$$ \mathrm{E}[Z_i] = 0, \quad i \in \mathbb{N}, \tag{94a} $$

$$ \mathrm{E}[Z_i Z_j] = \sigma^2\, \mathrm{I}\{i = j\}, \quad i, j \in \mathbb{N}, \tag{94b} $$

then the process defined inductively via

$$ X_i = \sum_{k=1}^{p} a_k X_{i-k} + Z_i, \quad i \in \mathbb{N} \tag{95} $$

with the initialization

$$ (X_{1-p}, \ldots, X_0) = (W_{1-p}, \ldots, W_0) \tag{96} $$

satisfies the constraints (2).

(By Burg's maximum entropy theorem [3, Theorem 12.6.1], of all stochastic processes satisfying (2) the one of highest Shannon rate is the $p$-th order Gauss-Markov process. It is obtained when $(W_{1-p}, \ldots, W_0)$ is a centered Gaussian and $\{Z_i\}$ are IID $\sim \mathcal{N}(0, \sigma^2)$. Its Shannon entropy rate is $\frac{1}{2} \log(2\pi e \sigma^2)$.)

We first consider the case where $\alpha > 1$. Let $a_1, \ldots, a_p$, $\sigma^2$ and $\mathsf{K}_p$ be as above, and let $\epsilon > 0$ be arbitrarily small. By Proposition 5 there exists a SP $\{Z_i\}$ such that (94) holds and such that

$$ \lim_{n \to \infty} \frac{1}{n} h_\alpha(Z_1, \ldots, Z_n) > \frac{1}{2} \log(2\pi e \sigma^2) - \epsilon. \tag{97} $$

The matrix $\mathsf{K}_p$ is positive definite, so by the spectral representation theorem we can find vectors $\mathbf{w}_1, \ldots, \mathbf{w}_p \in \mathbb{R}^p$ and constants $q_1, \ldots, q_p > 0$ with $q_1 + \cdots + q_p = 1$ such that

$$ \mathsf{K}_p = \sum_{\ell=1}^{p} q_\ell\, \mathbf{w}_\ell \mathbf{w}_\ell^{\mathsf{T}}. \tag{98} $$

3 The Row-$\ell$ Column-$m$ element of the matrix $\mathsf{K}_p$ is $\alpha_{|\ell - m|}$. This matrix is thus the result of deleting the last column and last row of the $(p+1) \times (p+1)$ matrix that we assumed was positive definite.

(The vectors are eigenvectors of $\mathsf{K}_p$, and the constants $q_1, \ldots, q_p$ are the scaled eigenvalues of $\mathsf{K}_p$.) Draw the random vector $\mathbf{W}$ independently of $\{Z_i\}$ with

$$ \Pr[\mathbf{W} = \mathbf{w}_\ell] = q_\ell, $$

so that, by (98),

$$ \mathrm{E}[\mathbf{W} \mathbf{W}^{\mathsf{T}}] = \mathsf{K}_p. $$

Construct now the stochastic process $\{X_i\}$ using (95) initialized with $(X_{1-p}, \ldots, X_0)^{\mathsf{T}}$ being set to $\mathbf{W}$.
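For $p = 2$ the decomposition (98) can be written out by hand, because the matrix with entries $\alpha_{|\ell - m|}$ is then a $2 \times 2$ symmetric Toeplitz matrix with eigenvectors proportional to $(1, 1)$ and $(1, -1)$. The sketch below is an illustration restricted to this special case; the function names are hypothetical, and the scaling $q_\ell = \lambda_\ell / (2\alpha_0)$ is one choice among many that make the weights sum to one.

```python
import math


def two_point_initialization(alpha0, alpha1):
    """Weights q_l and vectors w_l with sum_l q_l w_l w_l^T = [[alpha0, alpha1], [alpha1, alpha0]]."""
    if not abs(alpha1) < alpha0:
        raise ValueError("the 2x2 matrix must be positive definite")
    s = math.sqrt(alpha0)
    # Eigenvalues alpha0 + alpha1 and alpha0 - alpha1, each divided by the trace 2 * alpha0.
    return [
        ((alpha0 + alpha1) / (2 * alpha0), (s, s)),
        ((alpha0 - alpha1) / (2 * alpha0), (s, -s)),
    ]


def second_moment(points):
    """E[W W^T] for a random vector W taking the value w with probability q."""
    return [[sum(q * w[i] * w[j] for q, w in points) for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    pts = two_point_initialization(2.0, 0.5)
    print(pts)
    print(second_moment(pts))  # approximately [[2.0, 0.5], [0.5, 2.0]]
```

The initial vector thus takes only two values, so the resulting process is a two-component mixture, which is the situation Lemmas 7 and 8 are designed for.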

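The resulting SP thus satisfies (2). We next study its Rényi rate. To that end, we study the Rényi entropy of the vector $X_1^n$. Let $f_{\mathbf{X}}$ denote its density, and let $f_{\mathbf{X}|\mathbf{w}_\ell}$ denote its conditional density given $\mathbf{W} = \mathbf{w}_\ell$, so

$$ f_{\mathbf{X}}(\mathbf{x}) = \sum_{\ell=1}^{p} q_\ell\, f_{\mathbf{X}|\mathbf{w}_\ell}(\mathbf{x}), \quad \mathbf{x} \in \mathbb{R}^n. $$

Consequently, by Lemma 7,

$$ h_\alpha(f_{\mathbf{X}}) \ge \min_{1 \le \ell \le p} h_\alpha(f_{\mathbf{X}|\mathbf{w}_\ell}), \tag{99} $$

and by Lemma 8

$$ h_\alpha(f_{\mathbf{X}}) \le \min_{1 \le \ell \le p} \Bigl\{ \frac{\alpha}{1-\alpha} \log q_\ell + h_\alpha(f_{\mathbf{X}|\mathbf{w}_\ell}) \Bigr\}. \tag{100} $$

We next study $h_\alpha(f_{\mathbf{X}|\mathbf{w}_\ell})$ for any given $\ell \in \{1, \ldots, p\}$. Recalling that $\mathbf{W}$ and $\{Z_i\}$ are independent, we conclude that, conditional on $\mathbf{W} = \mathbf{w}_\ell$, the random variables $X_1, \ldots, X_n$ are generated inductively via (95) with the initialization

$$ (X_{1-p}, \ldots, X_0)^{\mathsf{T}} = \mathbf{w}_\ell. $$

Conditionally on $\mathbf{W} = \mathbf{w}_\ell$, the random variables $X_1, \ldots, X_n$ are thus an affine transformation of $Z_1, \ldots, Z_n$. The transformation is of unit Jacobian (because the partial-derivatives matrix has 1's on the diagonal and 0's on the upper triangle), and thus

$$ h_\alpha(f_{\mathbf{X}|\mathbf{w}_\ell}) = h_\alpha(Z_1, \ldots, Z_n), \quad \ell \in \{1, \ldots, p\}. \tag{101} $$

From this, (99), and (100) it follows that

$$ h_\alpha(Z_1^n) \le h_\alpha(f_{\mathbf{X}}) \le \min_{1 \le \ell \le p} \Bigl\{ \frac{\alpha}{1-\alpha} \log q_\ell \Bigr\} + h_\alpha(Z_1^n). $$

Dividing by $n$ and using (97) establishes the result.

We next turn to the case $0 < \alpha < 1$. For every $M > 0$ arbitrarily large, we use Proposition 5 to construct $\{Z_i\}$ as above but with

$$ \lim_{n \to \infty} \frac{1}{n} h_\alpha(Z_1, \ldots, Z_n) > M. $$

The proof continues as for the case where $\alpha$ exceeds one. □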
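The unit-Jacobian claim used for (101) can be checked mechanically. The sketch below builds the matrix of partial derivatives $\partial X_i / \partial Z_j$ for the recursion (95) with a fixed initialization and verifies that it is lower triangular with ones on the diagonal, so its determinant is one. The function names and the sample coefficients are assumptions made for illustration.

```python
def ar_jacobian(coeffs, n):
    """Matrix of partial derivatives dX_i/dZ_j (i, j = 1..n) for X_i = sum_k a_k X_{i-k} + Z_i.

    coeffs holds (a_1, ..., a_p). The initial values X_{1-p}, ..., X_0 are treated as
    constants, so they do not contribute to the derivatives.
    """
    p = len(coeffs)
    jac = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            value = 1.0 if i == j else 0.0       # direct dependence of X_i on Z_i
            for k in range(1, p + 1):
                if i - k >= 0:                   # dependence through earlier X's
                    value += coeffs[k - 1] * jac[i - k][j]
            jac[i][j] = value
    return jac


def is_unit_lower_triangular(matrix):
    """True if all diagonal entries are 1 and all entries above the diagonal are 0."""
    size = len(matrix)
    diagonal_ok = all(matrix[i][i] == 1.0 for i in range(size))
    upper_ok = all(matrix[i][j] == 0.0 for i in range(size) for j in range(i + 1, size))
    return diagonal_ok and upper_ok


if __name__ == "__main__":
    jac = ar_jacobian([0.5, -0.25], 5)
    for row in jac:
        print(row)
    print(is_unit_lower_triangular(jac))
```

Since the determinant of a triangular matrix is the product of its diagonal entries, the map from $(Z_1, \ldots, Z_n)$ to $(X_1, \ldots, X_n)$ preserves volume, and the integral of the $\alpha$-th power of the density is unchanged.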

6 Discussion 

6.1 On Theorem 2

As the following heuristic argument demonstrates, one has to walk a fine line in order to achieve the supremum promised in Theorem 2. To see why, let us focus on the case where $h^*(\cdot)$ is strictly increasing and where there exist real constants $\lambda_0, \lambda_1 \in \mathbb{R}$ for which the function $f^*(x) = \exp\bigl( \lambda_0 + \lambda_1 r(x) \bigr)\, \mathrm{I}\{x \in \mathcal{S}\}$ is a density achieving $h^*(\Gamma)$. For any other density $g$ supported on $\mathcal{S}$ and satisfying

$$ \int g(x)\, r(x)\, dx = \Gamma \tag{102} $$

we then have (as in the proof of [3, Theorem 12.1.1])

$$ h(g) = h(f^*) - D(g \| f^*) \tag{103} $$

$$ = h^*(\Gamma) - D(g \| f^*). \tag{104} $$

Using this and (14) we thus obtain that if $\{Z_k\}$ is a stationary SP and if $f_Z$ is the density of $Z_1$ and

$$ \int_{\mathcal{S}} f_Z(x)\, r(x)\, dx = \Gamma, \tag{105} $$

then

$$ h_\alpha(\{Z_k\}) \le h^*(\Gamma) - D(f_Z \| f^*), \quad \alpha > 1. \tag{106} $$

Thus, for $h_\alpha(\{Z_k\})$ to be close to $h^*(\Gamma)$, the density of $Z_1$ must be "close" (in relative entropy) to $f^*$.⁴ We can repeat this argument for the joint density of $Z_1, Z_2$ to infer that $Z_1$ and $Z_2$ must be "nearly independent" with each being of density "nearly" $f^*$. More generally, for every fixed $m \in \mathbb{N}$ the joint density of $Z_1, \ldots, Z_m$ must be nearly of a product form. But, of course, choosing $\{Z_k\}$ IID will not work, because this choice would lead to a Rényi rate equal to $h_\alpha(f_{Z_1})$, which is typically smaller than $h(Z_1)$ (see (8)).

6.2 On Theorem [6] 

Theorem [H] has bearing on the spectral estimation problem, i.e., the problem 
of extrapolating the values of the autocovariance sequence from its first p + 
1 values. One approach is to choose the extrapolated sequence to be the 
autocovariance sequence of the stochastic process that—among all stochastic 
processes that have an autocovariance sequence that starts with these p + 1 
values—maximizes the Shannon rate, namely the p-th order Gauss-Markov 
process (Burg’s theorem). 

A different approach might be to choose some $\alpha > 1$ and to replace
the maximization of the Shannon rate with that of the order-$\alpha$ Rényi rate.
As we next argue, Theorem 6 shows that this would result in the same
extrapolated sequence. Indeed, inspecting the proof of the theorem we see
that the stochastic process {W} that we constructed, while not a Gauss-Markov
process, has the same autocovariance sequence as the p-th order
Gauss-Markov process that satisfies the constraints. And, for $\alpha > 1$, the
supremum can only be achieved by a stochastic process of this autocovariance
sequence: for any other autocovariance function the Rényi rate is upper
bounded by the Shannon rate (because $\alpha > 1$), and the latter is upper
bounded by the Shannon rate of the Gaussian process, which, unless the
autocovariance sequence is that of the p-th order Gauss-Markov process, is
strictly smaller than the supremum (Burg's theorem).
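
For concreteness, the extrapolation associated with the p-th order Gauss-Markov process can be computed from the Yule-Walker equations. The following is a minimal sketch, assuming the given values form a valid (positive definite) autocovariance prefix. The function names are hypothetical, and the Levinson-Durbin recursion is a standard solver used here for illustration, not something the text prescribes.

```python
def levinson_durbin(acov):
    """Solve the Yule-Walker equations for acov[0..p].

    Returns (phi, err) where phi[j] multiplies the lag-(j+1) sample in the
    AR(p) recursion and err is the innovation variance.
    """
    p = len(acov) - 1
    phi = []
    err = acov[0]
    for m in range(1, p + 1):
        # Reflection coefficient of order m.
        acc = acov[m] - sum(phi[j] * acov[m - 1 - j] for j in range(m - 1))
        k = acc / err
        # Order update of the AR coefficients.
        phi = [phi[j] - k * phi[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return phi, err

def extrapolate(acov, n_extra):
    """Extend acov[0..p] by n_extra lags using R(k) = sum_j phi_j R(k - j)."""
    phi, _ = levinson_durbin(acov)
    seq = list(acov)
    for _ in range(n_extra):
        seq.append(sum(phi[j] * seq[-1 - j] for j in range(len(phi))))
    return seq

# First-order example: the prefix (1, 0.5) extends geometrically.
print(extrapolate([1.0, 0.5], 3))
```

By the argument above, for $\alpha > 1$ the same extended sequence would be selected whether the Shannon rate or the order-$\alpha$ Rényi rate is maximized.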

A Proof of Proposition 10

In this appendix we present two lemmas, which we then use to prove
Proposition 10 on approaching $h^*(T)$ using bounded densities.

4 We are ignoring here the fact that one might consider approaching the supremum
with (105) only being an inequality.

Lemma 13. Let $f$ be a density supported by $S$ for which $h(f)$ is defined; for which

\[ \int f(x)\, |r(x)|\, dx < \infty; \tag{107} \]

and for which

\[ \int f(x)\, r(x)\, dx \le T \tag{108} \]

for some $T \in \mathbb{R}$. Then for every $\delta > 0$ there exists a density $\tilde{f}$ that is
bounded, supported by $S$, and that satisfies

\[ \int \tilde{f}(x)\, r(x)\, dx \le T + \delta \tag{109} \]

and

\[ h(\tilde{f}) \ge h(f) - \delta. \tag{110} \]

Proof. Let $0 < \epsilon < 1$ be fixed (small), with its choice specified later. It
follows from (107) and the MCT that there exists some $M_1$ sufficiently large
so that

\[ \int \bigl( f(x) - (f(x) \wedge M_1) \bigr)\, |r(x)|\, dx < \epsilon, \]

where we recall that $a \wedge b$ stands for $\min\{a, b\}$. Since the density $f$ integrates
to 1, we can find some $M_2$ sufficiently large so that

\[ \int (f(x) \wedge M_2)\, dx > 1 - \epsilon. \]

Define now

\[ M = \max\{1, M_1, M_2\}. \tag{111} \]

For this $M$ we have:

\[ \int (f(x) \wedge M)\, dx > 1 - \epsilon, \tag{112a} \]

\[ \int \bigl( f(x) - (f(x) \wedge M) \bigr)\, |r(x)|\, dx < \epsilon, \tag{112b} \]

\[ \bigl( f(x) \ge 1 \bigr) \implies \bigl( f(x) \wedge M \ge 1 \bigr). \tag{112c} \]

Consider now the bounded density

\[ \tilde{f}(x) = \frac{1}{\beta}\, (f(x) \wedge M) \tag{113a} \]

where

\[ \beta = \int (f(x) \wedge M)\, dx. \tag{113b} \]

Note that because $f(x) \wedge M$ is upper-bounded by $f(x)$, which integrates to
one, and because of (112a),

\[ 1 - \epsilon < \beta \le 1, \tag{114} \]

so

\[ (f(x) \wedge M) \le \tilde{f}(x) \le \frac{1}{1 - \epsilon}\, (f(x) \wedge M). \tag{115} \]

Moreover, $\tilde{f}$ is supported by $S$.

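Given $\delta > 0$ we next show that by choosing $\epsilon$ sufficiently small we can
guarantee that both (109) and (110) hold. We begin with the former. Starting
with (113a) we have

\begin{align}
\int \tilde{f}(x)\, r(x)\, dx
&= \frac{1}{\beta} \int (f(x) \wedge M)\, r(x)\, dx \\
&= \frac{1}{\beta} \int \Bigl[ f(x) - \bigl( f(x) - f(x) \wedge M \bigr) \Bigr]\, r(x)\, dx \\
&= \frac{1}{\beta} \int f(x)\, r(x)\, dx + \frac{1}{\beta} \int \bigl( f(x) - (f(x) \wedge M) \bigr)\, (-r(x))\, dx \\
&\le \frac{1}{\beta}\, T + \frac{1}{\beta} \int \bigl( f(x) - (f(x) \wedge M) \bigr)\, |r(x)|\, dx \\
&\le \frac{1}{\beta}\, T + \frac{1}{\beta}\, \epsilon \\
&\le T + \frac{\epsilon}{1 - \epsilon}\, |T| + \frac{\epsilon}{1 - \epsilon}, \tag{116}
\end{align}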


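where the first inequality follows from (108); the second from (112b); and
the last from (114).

We next study $h(\tilde{f})$. Starting with the definition of $\tilde{f}$,

\begin{align}
h(\tilde{f})
&= \log \beta + \frac{1}{\beta} \int (f(x) \wedge M) \log \frac{1}{f(x) \wedge M}\, dx \\
&= \log \beta + \frac{1}{\beta} \int_{x \colon f(x) \le 1} (f(x) \wedge M) \log \frac{1}{f(x) \wedge M}\, dx \\
&\quad + \frac{1}{\beta} \int_{x \colon f(x) > 1} (f(x) \wedge M) \log \frac{1}{f(x) \wedge M}\, dx. \tag{117}
\end{align}

By (111), $f(x) \wedge M = f(x)$ whenever $f(x) \le 1$, so

\[ \int_{x \colon f(x) \le 1} (f(x) \wedge M) \log \frac{1}{f(x) \wedge M}\, dx = \int_{x \colon f(x) \le 1} f(x) \log \frac{1}{f(x)}\, dx. \tag{118} \]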


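Since $\xi \log \xi^{-1}$ is decreasing for $\xi \ge 1$, and since $f(x) \ge 1$ implies $f(x) \wedge M \ge 1$
(by (112c)),

\[ (f(x) \wedge M) \log \frac{1}{f(x) \wedge M} \ge f(x) \log \frac{1}{f(x)}, \quad \bigl( f(x) \ge 1 \bigr) \]

and hence

\[ \int_{x \colon f(x) > 1} (f(x) \wedge M) \log \frac{1}{f(x) \wedge M}\, dx \ge \int_{x \colon f(x) > 1} f(x) \log \frac{1}{f(x)}\, dx. \tag{119} \]

Summing (118) and (119) we obtain

\[ \int (f(x) \wedge M) \log \frac{1}{f(x) \wedge M}\, dx \ge h(f). \tag{120} \]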

Using this, (117), and (114) we conclude that

\[ h(\tilde{f}) = h(f), \quad \text{whenever } h(f) = \infty \]

and

\[ h(\tilde{f}) \ge \log(1 - \epsilon) + h(f) - \frac{\epsilon}{1 - \epsilon}\, |h(f)|, \quad \text{whenever } |h(f)| < \infty. \tag{121} \]

And obviously $h(\tilde{f}) \ge h(f)$ whenever $h(f) = -\infty$.

The result now follows by choosing $\epsilon$ small enough to guarantee that the
RHS of (116) does not exceed $T + \delta$ and, if $h(f)$ is finite, that the RHS of
(121) exceeds $h(f) - \delta$. □
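
The truncate-and-renormalize construction (113a)-(113b) can be tried on a concrete unbounded density. The sketch below assumes $S = (0, 1]$, $f(x) = 1/(2\sqrt{x})$, and $r(x) = x$; these choices, the level $M = 50$, and the helper names are illustrative and not from the text. For this $f$ one has $\int f(x)\, r(x)\, dx = 1/3$ and $h(f) = \ln 2 - 1$ nats. All integrals use the substitution $x = u^2$ to remove the singularity at the origin.

```python
import math

def midpoint(func, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))

def f(x):
    """Unbounded density on (0, 1]; an assumed example, not from the text."""
    return 1.0 / (2.0 * math.sqrt(x))

def r(x):
    """Assumed cost function."""
    return x

T_F = 1.0 / 3.0           # closed-form cost of f
H_F = math.log(2.0) - 1   # closed-form differential entropy of f, in nats

def truncated_stats(M):
    """Return (beta, cost, entropy) of f_tilde = min(f, M) / beta."""
    clipped = lambda x: min(f(x), M)
    # Substitution x = u^2, dx = 2u du, applied to every integral.
    beta = midpoint(lambda u: clipped(u * u) * 2.0 * u, 0.0, 1.0)
    cost = midpoint(lambda u: clipped(u * u) * r(u * u) * 2.0 * u, 0.0, 1.0) / beta

    def entropy_integrand(u):
        ft = clipped(u * u) / beta
        return ft * math.log(1.0 / ft) * 2.0 * u

    entropy = midpoint(entropy_integrand, 0.0, 1.0)
    return beta, cost, entropy

beta, cost, entropy = truncated_stats(50.0)
eps = 2.0 * (1.0 - beta)  # any eps with 1 - eps < beta works here
print("beta    =", beta)
print("cost    =", cost, "<=", T_F + eps / (1 - eps) * abs(T_F) + eps / (1 - eps))
print("entropy =", entropy, ">=", math.log(1 - eps) + H_F - eps / (1 - eps) * abs(H_F))
```

In this example the bounded density stays within the bounds on the right-hand sides of (116) and (121), and both bounds tighten toward $1/3$ and $h(f)$ as $M$ grows.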

The following lemma addresses the case where (107) does not hold.

Lemma 14. Let the density $f$ supported by $S$ be such that

\[ \int f(x)\, r(x)\, dx = -\infty \tag{122} \]

and $h(f)$ is defined and exceeds $-\infty$,

\[ h(f) > -\infty. \tag{123} \]

Then there exists a sequence of densities $\{f_k\}$ supported by $S$ for which

\[ \int f_k(x)\, |r(x)|\, dx < \infty, \]

\[ \lim_{k \to \infty} h(f_k) = h(f), \]

and

\[ \lim_{k \to \infty} \int f_k(x)\, r(x)\, dx = -\infty. \]


Proof. Let $r^+ = \max\{r, 0\}$ and $r^- = \max\{-r, 0\}$, so $r = r^+ - r^-$ with

\[ \int f(x)\, r^-(x)\, dx = \infty, \tag{124a} \]

\[ \int f(x)\, r^+(x)\, dx < \infty. \tag{124b} \]

Define

\[ \mathcal{D}_k = \{x \colon r^-(x) \le k\}. \tag{125} \]

By the MCT

\[ \lim_{k \to \infty} \int_{\mathcal{D}_k} f(x)\, r^+(x)\, dx = \int f(x)\, r^+(x)\, dx < \infty \tag{126a} \]

and

\[ \lim_{k \to \infty} \int f(x)\, r^-(x)\, I\{x \in \mathcal{D}_k\}\, dx = \infty. \tag{126b} \]

Consequently,

\[ \lim_{k \to \infty} \int_{\mathcal{D}_k} f(x)\, r(x)\, dx = -\infty. \tag{127} \]


The lemma's hypotheses guarantee that $h(f)$ is defined and exceeds $-\infty$.
Consequently,

\[ h(f) = h^+(f) - h^-(f), \]

with

\[ h^-(f) < \infty, \quad h^+(f) \le \infty, \tag{128} \]

where

\[ h^+(f) = \int f(x) \log \frac{1}{f(x)}\, I\{f(x) \le 1\}\, dx, \]

\[ h^-(f) = \int f(x) \log f(x)\, I\{f(x) > 1\}\, dx. \]

By the MCT

\[ \int_{\mathcal{D}_k} f(x) \log \frac{1}{f(x)}\, I\{f(x) \le 1\}\, dx \uparrow h^+(f) \]

and

\[ \int_{\mathcal{D}_k} f(x) \log f(x)\, I\{f(x) > 1\}\, dx \uparrow h^-(f) \]

so, upon subtracting (and recalling $h^-(f) < \infty$),

\[ \lim_{k \to \infty} \int_{\mathcal{D}_k} f(x) \log \frac{1}{f(x)}\, dx = h(f). \tag{129} \]

Define

\[ \beta_k = \int_{\mathcal{D}_k} f(x)\, dx. \]

Note that since $f$ is a density,

\[ \beta_k \le 1 \]

and (by the MCT)

\[ \beta_k \uparrow 1. \tag{130} \]

Consequently,

\[ 0 < \beta_k \le 1, \quad k \text{ large}. \tag{131} \]

For every such sufficiently large $k$, define the density

\[ f_k(x) = \beta_k^{-1}\, f(x)\, I\{x \in \mathcal{D}_k\}. \]

It is supported by $S$, and its entropy $h(f_k)$ can be expressed as

\begin{align}
h(f_k)
&= \int f_k(x) \log \frac{1}{f_k(x)}\, dx \\
&= \int_{\mathcal{D}_k} f_k(x) \log \frac{1}{f_k(x)}\, dx \\
&= \frac{1}{\beta_k} \int_{\mathcal{D}_k} f(x) \log \frac{\beta_k}{f(x)}\, dx \\
&= \log \beta_k + \frac{1}{\beta_k} \int_{\mathcal{D}_k} f(x) \log \frac{1}{f(x)}\, dx.
\end{align}

From this, (129), and (130) we obtain

\[ \lim_{k \to \infty} h(f_k) = h(f). \tag{132} \]

And as to the expectation of $r(x)$ under $f_k$:

\begin{align}
\int f_k(x)\, r(x)\, dx
&= \frac{1}{\beta_k} \int_{\mathcal{D}_k} f(x)\, r(x)\, dx \\
&= \frac{1}{\beta_k} \int_{\mathcal{D}_k} f(x)\, r^+(x)\, dx - \frac{1}{\beta_k} \int_{\mathcal{D}_k} f(x)\, r^-(x)\, dx.
\end{align}

The first term on the RHS is finite by (131) and (124b). The second tends
to $-\infty$ by (130) and (127). Hence,

\[ \lim_{k \to \infty} \int f_k(x)\, r(x)\, dx = -\infty. \tag{133} \]

Moreover,


\begin{align}
\int f_k(x)\, |r(x)|\, dx
&= \frac{1}{\beta_k} \int_{\mathcal{D}_k} f(x)\, r^+(x)\, dx + \frac{1}{\beta_k} \int_{\mathcal{D}_k} f(x)\, r^-(x)\, dx \\
&\le \frac{1}{\beta_k} \int f(x)\, r^+(x)\, dx + k \\
&< \infty, \tag{134}
\end{align}

where the first inequality follows from the nonnegativity of $r^+$ and from the
definition of the set $\mathcal{D}_k$ (125), and the second inequality follows from (124b)
and (131).

The lemma now follows from (134), (132), and (133). □
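
The restriction to $\mathcal{D}_k$ followed by renormalization can be traced in closed form for a simple case. The sketch below assumes $S = [1, \infty)$, $f(x) = 1/x^2$, and $r(x) = -x$, which are illustrative choices not taken from the text. Under these assumptions $\int f(x)\, r(x)\, dx = -\infty$, $h(f) = 2$ nats, $\mathcal{D}_k = [1, k]$, and $\beta_k = 1 - 1/k$; the function names are hypothetical.

```python
import math

H_F = 2.0  # closed-form differential entropy (nats) of f(x) = 1/x^2 on [1, inf)

def beta(k):
    """Mass that f assigns to D_k = [1, k]."""
    return 1.0 - 1.0 / k

def entropy_fk(k):
    """h(f_k) = log(beta_k) + (1/beta_k) * integral over D_k of f log(1/f)."""
    # The integral of (2 ln x) / x^2 over [1, k] equals 2 * (1 - (1 + ln k) / k).
    partial = 2.0 * (1.0 - (1.0 + math.log(k)) / k)
    return math.log(beta(k)) + partial / beta(k)

def cost_fk(k):
    """Expectation of r under f_k: minus the integral of 1/x over [1, k], over beta_k."""
    return -math.log(k) / beta(k)

def abs_cost_fk(k):
    """Expectation of |r| under f_k, which is finite for every finite k."""
    return math.log(k) / beta(k)

for k in (10.0, 1e3, 1e6):
    print(k, entropy_fk(k), cost_fk(k), abs_cost_fk(k))
```

In this example the entropies approach $h(f) = 2$ while the expected cost decreases without bound, and the expected absolute cost of each $f_k$ stays below $k$, in line with (132), (133), and (134).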

Proof of Proposition 10. Since $T$ exceeds $T_0$, it follows from (TlTl) that

\[ -\infty < h^*(T) < \infty. \tag{135} \]

Let the density $f$ nearly achieve $h^*(T)$ in the sense that it is supported by $S$
and that

\[ \int f(x)\, r(x)\, dx \le T, \quad \text{and} \quad h(f) > h^*(T) - \frac{\delta}{2}. \tag{136} \]

By (135), (136), and the definition of $h^*(T)$,

\[ -\infty < h(f) < \infty. \tag{137} \]

If $\int f(x)\, |r(x)|\, dx$ is finite, then the result follows directly from Lemma 13.
It remains to prove the result when this integral is infinite. In this case
$\int f(x)\, r(x)\, dx = -\infty$ by (136) (because $T < \infty$). Using this, the finiteness
of $h(f)$ (137), and Lemma 14 we infer the existence of a density $\tilde{f}$ that is
supported by $S$ and for which

\[ \int \tilde{f}(x)\, |r(x)|\, dx < \infty, \tag{138a} \]

\[ h(\tilde{f}) > h(f) - \frac{\delta}{2}, \tag{138b} \]

\[ \int \tilde{f}(x)\, r(x)\, dx \le T. \tag{138c} \]

Applying Lemma 13 to the density $\tilde{f}$, we conclude that there exists a bounded
density $f^*$ that is supported by $S$ and that satisfies

\[ h(f^*) > h(\tilde{f}) - \frac{\delta}{2} \quad \text{and} \quad \int f^*(x)\, r(x)\, dx \le T + \delta \tag{139} \]

and hence, in view of (138) and (136),

\[ h(f^*) > h^*(T) - \delta \quad \text{and} \quad \int f^*(x)\, r(x)\, dx \le T + \delta. \tag{140} \]

The existence of $f^*$ concludes the proof of the proposition for the case where
$\int f(x)\, |r(x)|\, dx$ is infinite. □